Assignment 3: Using logistic regression to classify subjective experience from brain data

Exercises and objectives

  1. Load the magnetoencephalographic recordings and do some initial plots to understand the data
  2. Do logistic regression to classify pairs of PAS-ratings
  3. Do a Support Vector Machine Classification on all four PAS-ratings

EXERCISE 1 - Load the magnetoencephalographic recordings and do some initial plots to understand the data

The files megmag_data.npy and pas_vector.npy can be downloaded here (http://laumollerandersen.org/data_methods_3/megmag_data.npy) and here (http://laumollerandersen.org/data_methods_3/pas_vector.npy)

library(reticulate)
import numpy as np 
import os 
import matplotlib.pyplot as plt

(GROUP) 1) Load megmag_data.npy and call it data using np.load. You can use join, which can be imported from os.path, to create paths from different string segments

data = np.load('/Users/minaalmasi/Documents/Cognitive_Science/Methods_3/methods3_code/data_W8/megmag_data.npy')

(DB) i. The data is a 3-dimensional array. The first dimension is number of repetitions of a visual stimulus , the second dimension is the number of sensors that record magnetic fields (in Tesla) that stem from neurons activating in the brain, and the third dimension is the number of time samples. How many repetitions, sensors and time samples are there?

data.shape # prints (number of repetitions, number sensors, number of time samples)
## (682, 102, 251)

There are 682 repetitions, 102 number of sensors and 251 number of time samples.

(MA) ii. The time range is from (and including) -200 ms to (and including) 800 ms with a sample recorded every 4 ms. At time 0, the visual stimulus was briefly presented. Create a 1-dimensional array called times that represents this.

times = np.arange(-200, 804, 4)

(MA) iii. Create the sensor covariance matrix \(\Sigma_{XX}\): \[\Sigma_{XX} = \frac 1 N \sum_{i=1}^N XX^T\] \(N\) is the number of repetitions and \(X\) has \(s\) rows and \(t\) columns (sensors and time), thus the shape is \(X_{s\times t}\). Do the sensors pick up independent signals? (Use plt.imshow to plot the sensor covariance matrix)

N = len(data) #length of data

cov_temp = np.zeros(shape = (102, 102)) #creating an empty matrix

for i in range(N): #for loop in the range of the data 
  X = data[i,:,:]
  X_T = np.transpose(X)
  cov_temp += np.dot(X, X_T)

#dividing by length of data 
cov = cov_temp / N

## PLOTTING covariance matrix ## 
plt.figure()
plt.imshow(cov)
plt.title("1.1iii: Sensor Covariance Matrix")
plt.colorbar()
## <matplotlib.colorbar.Colorbar object at 0x1278fa130>
plt.show()

From the plot, it appears that the sensors seem to pick up different signals for the most part, since the covariance matrix plot of the sensors has a petrol/navy color in general which corresponds to a covariance of about 0. The yellow color at around 50 however indicates high covariance. This covariance is so high that it distorts the rest of the picture. This may be amended by standardising the data:

from sklearn.preprocessing import StandardScaler 

N = len(data) #length of data

cov_temp = np.zeros(shape = (102, 102)) #creating an empty matrix

for i in range(N): #for loop in the range of the data 
  scaler = StandardScaler()
  X = data[i,:,:]
  X_scaled = scaler.fit_transform(X)
  X_T = np.transpose(X_scaled)
  cov_temp += np.dot(X_scaled, X_T)

#dividing by length of data 
cov_scaled = cov_temp / N

## PLOTTING covariance matrix ## 
plt.figure()
plt.imshow(cov_scaled, vmin=-200, vmax=200) 
plt.title("1.1iii: Scaled Sensor Covariance Matrix")
plt.colorbar()
## <matplotlib.colorbar.Colorbar object at 0x12a6345e0>
plt.show()

By standardising the data and setting the upper limit to a lower value (vmax = 200), the covariance of the sensors become more apparent. Hence not all sensors pick up completely independent signals.

(ADS) iv. Make an average over the repetition dimension using np.mean - use the axis argument. (The resulting array should have two dimensions with sensors as the first and time as the second)

mean_teslas = np.mean(data, axis = 0) #axis = 0 indicates the first dimension (repetitions)

mean_teslas.shape #checking whether we managed to collapse the first dimension
## (102, 251)

(MS) v. Plot the magnetic field (based on the average) as it evolves over time for each of the sensors (a line for each) (time on the x-axis and magnetic field on the y-axis). Add a horizontal line at \(y = 0\) and a vertical line at \(x = 0\) using plt.axvline and plt.axhline

plt.figure()
plt.plot(times, mean_teslas.T)
plt.axvline(color = "black")
plt.axhline(color = "black")
plt.title("1.1v: Average Magnetic Field Across Repetitions")
plt.xlabel("Time (ms)")
plt.ylabel("Magnetic Field (Tesla)")
plt.show()

(MS) vi. Find the maximal magnetic field in the average. Then use np.argmax and np.unravel_index to find the sensor that has the maximal magnetic field.

# MAXIMAL MAGNETIC FIELD IN AVG. #
np.max(mean_teslas)

# SENSOR with THE MAXIMAL MAGNETIC FIELD # 
## 2.7886216843591933e-13
np.unravel_index(np.argmax(mean_teslas), mean_teslas.shape) #first argument is the index of the maximal tesla value, the second argument is the dimensions of the array 
## (73, 112)
times[112] # time in ms 
## 248

The maximal magnetic field in the average is 2.7886216843591933e-13.

With np.unravel_index and np.argmax we get the sensor with the maximal magnetic field to be 73 as given by the coordinate system (73, 112). In the calculation, the time is given in the time-samples (112), but by printing this index from the times vector, we get the time in ms of 248. This number fits the peak of the red sensor in the plot from 1.1v.

(ADS) vii. Plot the magnetic field for each of the repetitions (a line for each) for the sensor that has the maximal magnetic field. Highlight the time point with the maximal magnetic field in the average (as found in 1.1.v) using plt.axvline

## SAVING Sensor 73 ## 
sensor73 = data[:,73,:]

## PLOTTING Sensor 73 ##
plt.figure()
plt.plot(times, sensor73.T)
plt.axvline(times[112], color = "black", linestyle = "--", label = "time at avg. maximal magnetic field")
plt.legend(loc = "upper right")
plt.axvline(color = "black")
plt.axhline(color = "black")

plt.title("1.1vii: Magnetic Field Across Repetitions for Sensor 73")
plt.xlabel("Time (ms)")
plt.ylabel("Magnetic Field (Tesla)")

plt.show()

(MA) viii. Describe in your own words how the response found in the average is represented in the single repetitions. But do make sure to use the concepts signal and noise and comment on any differences on the range of values on the y-axis

In the plot in 1.1vii, we zoom in on the recordings made by sensor 73 for all 682 repetitions (one line for each repetition). On plot 1.1vii, it is possible to see the indication of an signal/effect of sensor 73 as we see that the range of y-values is narrower and more positive at the dotted line.

It is the average of these repetitions that is visible in 1.1v as the red line with the highest peak (sensor 73). Averaging across each repetition cancels the noise and makes the signal more clearly visible.

(GROUP) 2) Now load pas_vector.npy (call it y). PAS is the same as in Assignment 2, describing the clarity of the subjective experience the subject reported after seeing the briefly presented stimulus

y = np.load('/Users/minaalmasi/Documents/Cognitive_Science/Methods_3/methods3_code/data_W8/pas_vector.npy')

(ADS) i. Which dimension in the data array does it have the same length as?

# CHECKING THE length of y #
len(y)

# The dimension it has the same length as #
## 682
len(y) == len(data[:,0,0])
## True

y has the same length as the first dimension repetitions in our data array.

(MA) ii. Now make four averages (As in Exercise 1.1.iii), one for each PAS rating, and plot the four time courses (one for each PAS rating) for the sensor found in Exercise 1.1.v

## FINDING the indexes of the pas ratings ##
pas1 = np.where(y == 1)
pas2 = np.where(y == 2)
pas3 = np.where(y == 3)
pas4 = np.where(y == 4)

## FINDING the mean TESLA values for each pas rating for all time values ##
mean_pas1 = np.mean(sensor73[pas1], axis = 0)
mean_pas1.shape
## (251,)
mean_pas2 = np.mean(sensor73[pas2], axis = 0)
mean_pas2.shape
## (251,)
mean_pas3 = np.mean(sensor73[pas3], axis = 0)
mean_pas3.shape
## (251,)
mean_pas4 = np.mean(sensor73[pas4], axis = 0)
mean_pas4.shape 

## PLOTTING the mean TESLA values for sensor 73 for each pas rating ##
## (251,)
plt.figure()
plt.plot(times, mean_pas1.T, color = "blue", label = "pas1")
plt.plot(times, mean_pas2.T, color = "green", label = "pas2")
plt.plot(times, mean_pas3.T, color = "orange", label = "pas3")
plt.plot(times, mean_pas4.T, color = "red", label = "pas4")
plt.axvline(color = "black")
plt.axhline(color = "black")

plt.legend()

plt.title("1.2ii: Average Magnetic Field Across Repetitions for Sensor 73")
plt.xlabel("Time (ms)")
plt.ylabel("Magnetic Field (Tesla)")

plt.show()

EXERCISE 2 - Do logistic regression to classify pairs of PAS-ratings

1) Now, we are going to do Logistic Regression with the aim of classifying the PAS-rating given by the subject

(DB) i. We’ll start with a binary problem - create a new array called data_1_2 that only contains PAS responses 1 and 2. Similarly, create a y_1_2 for the target vector

## CREATING a new array ##
pas1_2= np.where((y==1)|(y==2))
data_1_2=data[pas1_2]

data_1_2.shape #checking that we did it right

## target vector ##
## (214, 102, 251)
y_1_2 = y[(y==1)|(y==2)] 

ii. (MA) Scikit-learn expects our observations (data_1_2) to be in a 2d-array, which has samples (repetitions) on dimension 1 and features (predictor variables) on dimension 2. Our data_1_2 is a three-dimensional array. Our strategy will be to collapse our two last dimensions (sensors and time) into one dimension, while keeping the first dimension as it is (repetitions). Use np.reshape to create a variable X_1_2 that fulfils these criteria.

## RESHAPING to create X_1_2 ##
X_1_2 = data_1_2.reshape(data_1_2.shape[0],-1) 

X_1_2.shape
## (214, 25602)

data_1_2.shape[0] is repetitions with pas1 or pas2. This is the dimension we want to keep. -1 collapses the second (sensors) and third (time) dimension.

(ADS) iii. Import the StandardScaler and scale X_1_2

from sklearn.preprocessing import StandardScaler 
scaler = StandardScaler()

## SCALING X_1_2 ##
X_1_2_scaled = scaler.fit_transform(X_1_2) 
X_1_2_scaled.shape
## (214, 25602)

(MA) iv. Do a standard LogisticRegression - can be imported from sklearn.linear_model - make sure there is no penalty applied

from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(penalty='none')

## FITTING a logistic regression to our scaled data ## 
lr.fit(X_1_2_scaled,y_1_2)
## LogisticRegression(penalty='none')

(DB) v. Use the score method of LogisticRegression to find out how many labels were classified correctly. Are we overfitting? Besides the score, what would make you suspect that we are overfitting?

lr.score(X_1_2_scaled, y_1_2)
## 1.0

Our logistic regression correctly classifies 100% of the labels (as the output is 1.0), which makes us suspect that something is wrong. Considering this perfect accuracy, the model is clearly overfitting to the data. This may have happened since we have not split the data into test and training sets, so the model “knows” the correct classifications as it has been trained on the very data it is asked to predict.

(ADS) vi. Now apply the L1 penalty instead - how many of the coefficients (.coef_) are non-zero after this?

## APPPLYING L1 ## 
lr = LogisticRegression(penalty='l1', solver ='liblinear', random_state=1)
lr.fit(X_1_2_scaled,y_1_2)

## COUNTING the non-zero coefficients ##
## LogisticRegression(penalty='l1', random_state=1, solver='liblinear')
np.count_nonzero(lr.coef_) 
## 282

(MS DB) vii. Create a new reduced \(X\) that only includes the non-zero coefficients - show the covariance of the non-zero features (two covariance matrices can be made; \(X_{reduced}X_{reduced}^T\) or \(X_{reduced}^TX_{reduced}\) (you choose the right one)) . Plot the covariance of the features using plt.imshow. Compared to the plot from 1.1.iii, do we see less covariance?

(MS) Firstly, we subset reduced X to only include non-zero coefficients:

## REDUCED X ##
non_zero = np.where(lr.coef_ != 0)

## TAKING the previous X_1_2_scaled and subsetting the non-zero coefficients
reduced_X = X_1_2_scaled[:,non_zero[1]]

reduced_X.shape
## (214, 282)

Now we make the matrix. We would like to end up with a 282 by 282 matrix. When we multiply two matrices, we get the first matrix’ rows and the second matrix’ columns. We choose the formula where the rows of the transposed matrix (282) is multiplied with the columns of the matrix (282): \(X_{reduced}^T X_{reduced}\) or \(features \times features\).

(DB):

## COVARIANCE of Non-zero Features ##
reduced_X_cov = reduced_X.T@reduced_X
reduced_X_cov.shape

## PLOTTING the covariance plot ##
## (282, 282)
plt.figure()
plt.imshow(reduced_X_cov)
plt.title("2.1vii: Non-Zero Coefficients Covariance Matrix")
plt.colorbar()
## <matplotlib.colorbar.Colorbar object at 0x12a693730>
plt.show()

## PLOTTING 1.1iii (scaled) ##

plt.figure()
plt.imshow(cov_scaled, vmin=-200, vmax=200) 
plt.title("1.1iii: Scaled Sensor Covariance Matrix")
plt.colorbar()
## <matplotlib.colorbar.Colorbar object at 0x12a63b3d0>
plt.show()

Compared to the scaled plot from 1.1iii, we see less covariance in the 2.1vii plot. This is due to our selection of only non-zero features. We have removed all the features which do not contribute to explaining more variance. In a way, this is the same as keeping as many variables as possible which explain different parts of the variance. For this reason, we expect and see less covariance in the non-zero plot.

2) Now, we are going to build better (more predictive) models by using cross-validation as an outcome measure

(GROUP) i. Import cross_val_score and StratifiedKFold from sklearn.model_selection

from sklearn.model_selection import cross_val_score, StratifiedKFold

(DB) ii. To make sure that our training data sets are not biased to one target (PAS) or the other, create y_1_2_equal, which should have an equal number of each target. Create a similar X_1_2_equal. The function equalize_targets_binary in the code chunk associated with Exercise 2.2.ii can be used. Remember to scale X_1_2_equal!

# Exercise 2.2.ii
def equalize_targets_binary(data, y):
    np.random.seed(7)
    targets = np.unique(y) ## find the number of targets
    if len(targets) > 2:
        raise NameError("can't have more than two targets")
    counts = list()
    indices = list()
    for target in targets:
        counts.append(np.sum(y == target)) ## find the number of each target
        indices.append(np.where(y == target)[0]) ## find their indices
    min_count = np.min(counts)
    # randomly choose trials
    first_choice = np.random.choice(indices[0], size=min_count, replace=False)
    second_choice = np.random.choice(indices[1], size=min_count,replace=False)
    
    # create the new data sets
    new_indices = np.concatenate((first_choice, second_choice))
    new_y = y[new_indices]
    new_data = data[new_indices, :, :]
    
    return new_data, new_y
## USING the function ##
data_1_2_equal, y_1_2_equal = equalize_targets_binary(data_1_2, y_1_2)

## INVESTIGATING y_1_2_equal & data_1_2_equal ##
y_1_2_equal.shape
## (198,)
data_1_2_equal.shape

## RESHAPING ## 
## (198, 102, 251)
X_1_2_equal = data_1_2_equal.reshape(data_1_2_equal.shape[0],-1) 
X_1_2_equal.shape

## SCALING ##
## (198, 25602)
scaler = StandardScaler()
X_1_2_equal_scaled = scaler.fit_transform(X_1_2_equal)

## CHECKING that we did it right ## 
X_1_2_equal_scaled.shape
## (198, 25602)

By running X_1_2_equal_scaled.shape, we see that we have now 198 pas ratings (corresponding to 99 pas1 and 99 pas2 ratings) and the 25602 coefficients.

(MS) iii. Do cross-validation with 5 stratified folds doing standard LogisticRegression (See Exercise 2.1.iv)

## FITTING the model ##
log = LogisticRegression(penalty='none')

log.fit(X_1_2_equal_scaled, y_1_2_equal)

## CROSS-VALIDATION ## 
## LogisticRegression(penalty='none')
cv = StratifiedKFold(n_splits=5) #defining that we want 5 folds 

scores_log = cross_val_score(log, X_1_2_equal_scaled, y_1_2_equal, cv=cv)
print("Cross-Validation Score (no penality):", round(np.mean(scores_log), 3))
## Cross-Validation Score (no penality): 0.535

Our model is barely above chance level (0.535) when you take the mean of the prediction accuracy from the 5 cv fits.

(DB, MS) iv. Do L2-regularisation with the following Cs= [1e5, 1e1, 1e-5]. Use the same kind of cross-validation as in Exercise 2.2.iii. In the best-scoring of these models, how many more/fewer predictions are correct (on average)?

(DB):

#LOG 1e5 
log_1e5 = LogisticRegression(penalty='l2', C=1e5)
log_1e5.fit(X_1_2_equal_scaled, y_1_2_equal)
## LogisticRegression(C=100000.0)
scores_log_1e5 = cross_val_score(log_1e5, X_1_2_equal_scaled, y_1_2_equal, cv=cv)

#LOG 1e1
log_1e1 = LogisticRegression(penalty='l2', C=1e1)
log_1e1.fit(X_1_2_equal_scaled, y_1_2_equal)
## LogisticRegression(C=10.0)
scores_log_1e1 = cross_val_score(log_1e1, X_1_2_equal_scaled, y_1_2_equal, cv=cv)

#LOG 1e-5
log_1e_neg5 = LogisticRegression(penalty='l2', C=1e-5) 
log_1e_neg5.fit(X_1_2_equal_scaled, y_1_2_equal)
## LogisticRegression(C=1e-05)
scores_log_1e_neg5 = cross_val_score(log_1e_neg5, X_1_2_equal_scaled, y_1_2_equal, cv=cv)
## CROSS-VALIDATION Score ## 
print("Cross-Validation Score w. C = 1e5:", np.mean(scores_log_1e5).round(3))
## Cross-Validation Score w. C = 1e5: 0.535
print("Cross-Validation Score w. C = 1e1:", np.mean(scores_log_1e1).round(3))
## Cross-Validation Score w. C = 1e1: 0.525
print("Cross-Validation Score w. C = 1e-5:", np.mean(scores_log_1e_neg5).round(3))
## Cross-Validation Score w. C = 1e-5: 0.596

The model with the lowest C (\(C = 1e-5\)) has the highest accuracy (0.596) meaning that introducing penalty/bias improves prediction accuracy. We can count the amount of predictions that are correct and compare with the model with no penalty made in 2.2iii (this model only had an accuracy of 0.535).

(MS):

from sklearn.model_selection import cross_val_predict

## ACCURACY Log 1e-5 (best scoring) ##
predict_log1e_neg5 = cross_val_predict(log_1e_neg5, X_1_2_equal_scaled, y_1_2_equal, cv=cv)

accuracy_log1e_neg5 = predict_log1e_neg5 == y_1_2_equal # list of all true & false classifications 

## ACCURACY Log Model 2.2iii ##
predict_log = cross_val_predict(log, X_1_2_equal_scaled, y_1_2_equal, cv=cv)

accuracy_log = predict_log == y_1_2_equal

## COUNTING correct predictions ## 
print("Correct Predictions Log 2.2iii:", len(np.where(accuracy_log == True)[0]))
## Correct Predictions Log 2.2iii: 106
print("Correct Predictions Log 1e-5:", len(np.where(accuracy_log1e_neg5 == True)[0]))
## Correct Predictions Log 1e-5: 118

The amount of predictions that are correct with the penalized model (log 1e-5) are 118 whereas the non-penalised model (from 2.2iii) only has 106.

(ADS) v. Instead of fitting a model on all n_sensors * n_samples features, fit a logistic regression (same kind as in Exercise 2.2.iv (use the C that resulted in the best prediction)) for each time sample and use the same cross-validation as in Exercise 2.2.iii. What are the time points where classification is best? Make a plot with time on the x-axis and classification score on the y-axis with a horizontal line at the chance level (what is the chance level for this analysis?)

## EMPTY list for the cross scores ##
cross_scores = []

## DEFINING cv again for good measure ## 
cv = StratifiedKFold(n_splits=5)

for i in range(251):
  #Creating data and scaling 
  scaler = StandardScaler()
  X_time = data_1_2_equal[:,:,i]
  X_time_scaled = scaler.fit_transform(X_time)
  
  #Creating a logistic regression object
  lr = LogisticRegression(penalty='l2', C=1e-5)
  
  #Cross-validating 
  score = cross_val_score(lr, X_time_scaled, y_1_2_equal, cv = cv)
  
  #taking the mean 
  mean = np.mean(score)
  
  #appending the mean
  cross_scores.append(mean)
  
## FINDING the time point where classification is best ##
indexmax = cross_scores.index(max(cross_scores))
times[indexmax]
## 232
plt.figure()
plt.axvline(x = times[indexmax], color = "black", alpha = 0.5)  
plt.plot(times, cross_scores)
plt.axhline(y = 0.50, color = "black")
plt.title("L2 PAS 1 & 2: Classification Accuracy vs. Time")
plt.xlabel("Time (ms)")
plt.ylabel("Accuracy")
plt.show()

The chance level for this analysis is 50% since it is a binary classification where it can only classify pas-ratings as \(pas1\) or \(pas2\). Hence, the classification is not much better than chance level.

(DB) vi. Now do the same, but with L1 regression - set C=1e-1 - what are the time points when classification is best? (make a plot)?

## EMPTY list for the cross scores ##
cross_scores_l1 = []

## DEFINING cv again for good measure ## 
cv = StratifiedKFold(n_splits=5)

for i in range(251):
  #Creating data and scaling 
  scaler = StandardScaler()
  X_time = data_1_2_equal[:,:,i]
  X_time_scaled = scaler.fit_transform(X_time)
  
  #Creating a logistic regression object
  lr = LogisticRegression(penalty='l1', solver = "liblinear", C=1e-1)
  
  #Cross-validating 
  score = cross_val_score(lr, X_time_scaled, y_1_2_equal, cv = cv)
  
  #taking the mean 
  mean = np.mean(score)
  
  #appending the mean
  cross_scores_l1.append(mean)
  
## FINDING the time point where classification is best ##
indexmax_l1 = cross_scores_l1.index(max(cross_scores_l1))
times[indexmax_l1]
## 244
plt.figure()
plt.axvline(x = times[indexmax_l1], color = "black", alpha = 0.5)  
plt.plot(times, cross_scores_l1)
plt.axhline(y = 0.50, color = "black")
plt.title("L1 PAS 1 & 2: Classification Accuracy vs. Time")
plt.xlabel("Time (ms)")
plt.ylabel("Accuracy")
plt.show()

For both the L1 and L2 plot, the time point where classification is best is around 230-40 ms.

(MA, MS) vii. Finally, fit the same models as in Exercise 2.2.vi but now for data_1_4 and y_1_4 (create a data set and a target vector that only contains PAS responses 1 and 4). What are the time points when classification is best? Make a plot with time on the x-axis and classification score on the y-axis with a horizontal line at the chance level (what is the chance level for this analysis?)

(MA): Firstly, we will create a target vector with pas1 and pas4 similar to the one we made in exercise 2.1i

pas1_4 = np.where((y==1)|(y==4))
data_1_4 = data[pas1_4]

## CREATING the target vector ##
y_1_4 = y[(y==1)|(y==4)] 

Now we equalize the data:

## USING the function from 2.2ii ##
data_1_4_equal, y_1_4_equal = equalize_targets_binary(data_1_4, y_1_4)

Finally, we cross-validate for each time stamp:

## EMPTY list for the cross scores ##
cross_scores_pas1_4 = []

## DEFINING cv again for good measure ## 
cv = StratifiedKFold(n_splits=5)

for i in range(251):
  #Creating data and scaling 
  scaler = StandardScaler()
  X_time = data_1_4_equal[:,:,i]
  X_time_scaled = scaler.fit_transform(X_time)
  
  #Creating a logistic regression object
  lr = LogisticRegression(penalty='l1', solver = "liblinear", C=1e-1)
  
  #Cross-validating 
  score = cross_val_score(lr, X_time_scaled, y_1_4_equal, cv = cv)
  
  #taking the mean 
  mean = np.mean(score)
  
  #appending the mean
  cross_scores_pas1_4.append(mean)
  

(MS):

## FINDING the time point where classification is best ##
indexmax_pas_1_4 = cross_scores_pas1_4.index(max(cross_scores_pas1_4))
times[indexmax_pas_1_4]
## 236
plt.figure()
plt.axvline(x = times[indexmax_pas_1_4], color = "black", alpha = 0.5)  
plt.plot(times, cross_scores_pas1_4)
plt.axhline(y = 0.50, color = "black")
plt.title("L1 PAS 1 & 4: Classification Accuracy vs. Time")
plt.xlabel("Time (ms)")
plt.ylabel("Accuracy")
plt.show()

The chance level for this analysis is again 50% as it is once again a binary classification.

The time point where the classification is the best is 236ms for the L1 PAS1 & PAS4 logistic regression which is quite close to the previous models.

(MA) 3) Is pairwise classification of subjective experience possible? Any surprises in the classification accuracies, i.e. how does the classification score fore PAS 1 vs 4 compare to the classification score for PAS 1 vs 2?

Theoretically, there should be a greater difference between PAS1 and PAS4 compared to the difference between PAS1 and PAS2 considering the meaning of these ratings as a great difference in experience. However, our pairwise comparisons in exercise 2.2 give quite similar results.

Another surprise has been the accuracy of our models in general which are close to chance level.

EXERCISE 3 - Do a Support Vector Machine Classification on all four PAS-ratings

1) Do a Support Vector Machine Classification

` #### (MA) i. First equalize the number of targets using the function associated with each PAS-rating using the function associated with Exercise 3.1.i

def equalize_targets(data, y):
    np.random.seed(7)
    targets = np.unique(y)
    counts = list()
    indices = list()
    for target in targets:
        counts.append(np.sum(y == target))
        indices.append(np.where(y == target)[0])
    min_count = np.min(counts)
    first_choice = np.random.choice(indices[0], size=min_count, replace=False)
    second_choice = np.random.choice(indices[1], size=min_count, replace=False)
    third_choice = np.random.choice(indices[2], size=min_count, replace=False)
    fourth_choice = np.random.choice(indices[3], size=min_count, replace=False)
    
    new_indices = np.concatenate((first_choice, second_choice,
                                 third_choice, fourth_choice))
    new_y = y[new_indices]
    new_data = data[new_indices, :, :]
    
    return new_data, new_y
data_all_pas_eq, all_pas_y_eq = equalize_targets(data, y)

(MS) ii. Run two classifiers, one with a linear kernel and one with a radial basis (other options should be left at their defaults) - the number of features is the number of sensors multiplied the number of samples. Which one is better predicting the category?

Firstly, we reshape and scale our data:

## RESHAPING ##
X_all_eq = data_all_pas_eq.reshape(data_all_pas_eq.shape[0],-1) 
X_all_eq.shape

## SCALING ##
## (396, 25602)
scaler = StandardScaler()
X_all_eq_scaled = scaler.fit_transform(X_all_eq)
X_all_eq_scaled.shape
## (396, 25602)
from sklearn.svm import SVC

## CREATING the classifier with a linear kernel ## 
Lsvc = SVC(kernel = 'linear')

## CREATING the classifier with a radial basis ## 
Rsvc = SVC(kernel = 'rbf')

To compare the prediction accuracy of the two classifiers, we cross-validate them:

## DEFINING cv again for good measure ## 
cv = StratifiedKFold(n_splits=5)

## CROSS-VALIDATING the linear support vector ##
Lsvc_scores = cross_val_score(Lsvc, X_all_eq_scaled, all_pas_y_eq, cv=cv)

## CROSS-VALIDATING the radial support vector ##
Rsvc_scores = cross_val_score(Rsvc, X_all_eq_scaled, all_pas_y_eq, cv=cv)

## printing the mean of the cross-validated performances ## 
print("Lsvc Mean Cross Validated:", round(np.mean(Lsvc_scores), 3))
## Lsvc Mean Cross Validated: 0.293
print("Rsvc Mean Cross Validated:", round(np.mean(Rsvc_scores), 3))
## Rsvc Mean Cross Validated: 0.333

The mean accuracy is higher for our support vector machine with a radial basis kernel (0.333) although both accuracies are quite low (slightly above chance level at 25 %).

(ADS, MA) iii. Run the sample-by-sample analysis (similar to Exercise 2.2.v) with the best kernel (from Exercise 3.1.ii). Make a plot with time on the x-axis and classification score on the y-axis with a horizontal line at the chance level (what is the chance level for this analysis?)

(ADS):

## EMPTY list for the cross scores ##
cross_scores_Rsvc = []

## DEFINING cv again for good measure ## 
cv = StratifiedKFold(n_splits=5)

for i in range(251):
  #Creating data and scaling 
  scaler = StandardScaler()
  X_time = data_all_pas_eq[:,:,i]
  X_time_scaled = scaler.fit_transform(X_time)
  
  #Instantiating a support vector machine with a radial basis
  Rsvc = SVC(kernel = 'rbf')
  
  #Cross-validating 
  score = cross_val_score(Rsvc, X_time_scaled, all_pas_y_eq, cv = cv)
  
  #taking the mean 
  mean = np.mean(score)
  
  #appending the mean
  cross_scores_Rsvc.append(mean)

(MA):

## FINDING the time point where classification is best ##
indexRsvc = cross_scores_Rsvc.index(max(cross_scores_Rsvc))
times[indexRsvc]
## 704
plt.figure()
plt.axvline(x = times[indexRsvc], color = "black", alpha = 0.7)  
plt.axhline(y = 0.25, color = "black") #horizontal line at chance level 
plt.plot(times, cross_scores_Rsvc)
plt.title("SVC with Radial Basis on all PAS: Classification Accuracy vs. Time")
plt.xlabel("Time (ms)")
plt.ylabel("Accuracy")
plt.show()

Since it is a multinomial classification with 4 categories, the chance level is 25 %.

(ADS) iv. Is classification of subjective experience possible at around 200-250 ms?

Looking at the plot from 3.1iii visually, we note that classification of PAS ratings at around 200-250 is above chance level. Hence it could be argued that a classification of subjective experience is possible at this time interval despite far from perfectly accurate.

(ADS) 2) Finally, split the equalized data set (with all four ratings) into a training part and test part, where the test part if (*is) 30 % of the trials. Use train_test_split from sklearn.model_selection

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_all_eq_scaled, all_pas_y_eq, test_size=0.30, random_state=42) #we save to four different variables, the random_state ensures that it randomly sorts the data-set into train and test 

(DB) i. Use the kernel that resulted in the best classification in Exercise 3.1.ii and fitthe training set and predict on the test set. This time your features are the number of sensors multiplied by the number of samples.

## INSTANTIATING kernel ##
Rsvc = SVC(kernel = 'rbf')

## FITTING ##
Rsvc.fit(X_train, y_train)

## PREDICTING ##
## SVC()
predicted_y = Rsvc.predict(X_test)
acc = list(y_test == predicted_y)

## PROPORTION of correctly predicted pas-scores ##
print("Proportion of correctly predicted pas:", round(acc.count(True)/len(acc), 3))
## Proportion of correctly predicted pas: 0.277

Performance is just above chance level (0.277).

(MA) ii. Create a confusion matrix. It is a 4x4 matrix. The row names and the column names are the PAS-scores. There will thus be 16 entries. The PAS1xPAS1 entry will be the number of actual PAS1, \(y_{pas1}\) that were predicted as PAS1, \(\hat y_{pas1}\). The PAS1xPAS2 entry will be the number of actual PAS1, \(y_{pas1}\) that were predicted as PAS2, \(\hat y_{pas2}\) and so on for the remaining 14 entries. Plot the matrix

## CONFUSION matrix ##
import pandas as pd
predicted_y = pd.Series(predicted_y, name = 'Predicted PAS')
y_test = pd.Series(y_test, name = 'Actual PAS')

print(pd.crosstab(y_test, predicted_y))
## Predicted PAS  1  2   3   4
## Actual PAS                 
## 1              4  5  16  12
## 2              1  8  12   7
## 3              1  4  14   7
## 4              2  5  14   7

(MA, DB) iii. Based on the confusion matrix, describe how ratings are misclassified and if that makes sense given that ratings should measure the strength/quality of the subjective experience. Is the classifier biased towards specific ratings?

(MA): We see an exaggerated representation in the 3rd column (where pas is predicted to be 3), indicating a bias. No matter the actual pas-rating, the model will most frequently predict the pas-rating to be 3 (for each row, the highest value is in the 3rd column). Misclassifications are most frequent when the actual pas-rating was 1 where 89% of trials are misclassified (see calculations below).

(DB): We imagine that pas3 (almost clear experience) is the “broadest” of the rating definitions thereby allowing participants to classify a broader variety of experiences as pas3 (contingent on their definition of “almost”). We could therefore expect that pas3 ratings are more scattered across the multidimensional space, making them harder for the SVM to isolate pas3 from the other ratings. Thus, pas3 is the obvious/safe guess in most cases for our model (which only performs slightly better than chance).

#MISCLASSIFICATIONS 
print("Pas1 Misclassification:", round((5+16+12)/(4+5+16+12)*100, 3)) #pas1
## Pas1 Misclassification: 89.189
print("Pas2 Misclassification:", round((1+12+7)/(1+8+12+7)*100, 3)) #pas2
## Pas2 Misclassification: 71.429
print("Pas3 Misclassification:", round((1+5+7)/(1+4+14+7)*100, 3)) #pas3
## Pas3 Misclassification: 50.0
print("Pas4 Misclassification:", round((2+5+14)/(2+5+14+7)*100, 3)) #pas4
## Pas4 Misclassification: 75.0